As part of a systematic algorithm study, we present first results on a performance comparison between a 
multibosonic algorithm and the hybrid Monte Carlo algorithm as employed by the SESAM collaboration. The 
standard Wilson fermion action is used on $32 \times 16^3$ lattices at $\beta = 5.5$. 



1. Introduction 

The past six years have witnessed a number 
of large simulations of full QCD with two degenerate 
sea quarks and Wilson-like actions. All of 
them were based on the hybrid Monte Carlo algorithm 
(HMC), which has been systematically improved 
over the years. These simulations, although 
milestones for lattice QCD in the 
pre-Teracomputing age, suffer from severe 
restrictions that prevent them from being truly 
realistic: we need to go to lighter quark masses and 
operate with three flavors, with masses eventually 
approaching the physical masses. 

This situation has motivated research 
and development of alternative algorithms of the 
multibosonic variety (MBA) that promise, in principle, 
to overcome some of the restrictions of HMC 
mentioned above. It has been shown that MBA can indeed 
be geared to reach the efficiency of HMC [2], 
and encouraging results have been achieved with 
MBA for the treatment of finite density QCD [3], 
supersymmetric field theories [4], as well as the 
case of three dynamical fermion flavors [5]. 

In this contribution we wish to describe our 
benchmarking of MBA, performed in the quest 
to achieve lighter quark masses than previously 
possible in the SESAM and other full QCD simulations. 
For details regarding the HMC algorithm 
used see ref. [6]. 

2. Gauging performance 

The efficiency of an algorithm is most suitably 
quoted as the computational cost, $E_{\rm ind}$, for producing 
a statistically independent vacuum gauge 
field configuration within the Markov process, expressed 
in units of the number of necessary Dirac 
matrix-vector multiplies, $M \cdot \phi$: 

$$ E_{\rm ind} = 2\,\tau_{\rm int}\,E_{\rm traj} , \qquad (1) $$

with $\tau_{\rm int}$ being the integrated autocorrelation 
time measured in units of update sequences called 
"trajectories", each having a cost of $E_{\rm traj}$. Being 
particularly interested in the quark mass dependence 
of the algorithms, we chose for our analysis 
two different quark mass settings, as given in 
table 1. Note that both quark masses are from the 
regime of current full QCD simulations. 

Volume: $32 \times 16^3$, $\beta = 5.5$ 

$\kappa$    $a$/fm    $r_0/a$    $m_\pi/m_\rho$ 
0.159       0.141     4.39(3)    0.800(10) 
0.160       0.117     4.89(3)    0.670(14) 

Table 1 
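
As a minimal sketch of how Eq. (1) is used, the following Python snippet evaluates the cost of one independent configuration from an autocorrelation time and a per-trajectory cost. The function name and the numerical inputs are hypothetical and serve only as an illustration; they are not measured values from this study.

```python
def independent_config_cost(tau_int: float, cost_per_trajectory: float) -> float:
    """Eq. (1): E_ind = 2 * tau_int * E_traj, in matrix-vector multiplies."""
    if tau_int <= 0 or cost_per_trajectory <= 0:
        raise ValueError("tau_int and cost_per_trajectory must be positive")
    return 2.0 * tau_int * cost_per_trajectory


# Illustrative (assumed) inputs: an algorithm with cheap trajectories but a
# long autocorrelation time can cost as much as one with expensive
# trajectories and a short autocorrelation time.
cheap_but_slow = independent_config_cost(tau_int=40.0, cost_per_trajectory=10_000)
costly_but_fast = independent_config_cost(tau_int=10.0, cost_per_trajectory=40_000)
print(cheap_but_slow, costly_but_fast)
```

The point of quoting $E_{\rm ind}$ rather than $\tau_{\rm int}$ alone is visible here: only the product of the two factors is a fair basis for comparing algorithms whose "trajectories" differ in cost.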

3. Defining the MBA algorithm 

The parameter space of MBA algorithms is 
large compared with the HMC situation: MBA optimization 
proceeds (a) by finding a polynomial of not too high 
order that approximates the inverse of the 
fermion matrix at the given working point, (b) 
by devising an efficient "trajectory" in the form of an 
updating sequence for the degrees of freedom, and 
(c) by exploiting a method to correct for the remaining 
systematic error. Today, all MBAs use a 
noisy correction step for (c). For a detailed discussion 
see e.g. refs. [7,8]. 

For the construction of appropriate polynomials 
we followed the prescription proposed in 
ref. which employs the GMRES algorithm 
to find a non-hermitian approximation of the inverse 
Wilson matrix with ultraviolet-filtering on 
a small sample of thermalized gauge field configurations. 
This was done to minimize the polynomial 
order and hence the number of boson fields. 
Quadratically optimized polynomials [11] need 
about 2-3 times more boson fields for similar 
acceptance rates. 
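
To illustrate why the polynomial order grows as the working point becomes more demanding, the sketch below uses a toy polynomial, the truncated geometric series $P_n(z) = \sum_{j=0}^{n} (1-z)^j = \prod_{k=1}^{n} (z_k - z)$ with roots $z_k = 1 - e^{2\pi i k/(n+1)}$, which approximates $1/z$ for $|1-z| < 1$. This is an assumption made for illustration only: it is not the GMRES-generated polynomial described above, the scalar $z$ merely stands in for an eigenvalue of the fermion matrix, and all function names are hypothetical.

```python
import cmath


def geometric_roots(n):
    """Roots z_k = 1 - exp(2*pi*i*k/(n+1)), k = 1..n, of the toy polynomial."""
    return [1 - cmath.exp(2j * cmath.pi * k / (n + 1)) for k in range(1, n + 1)]


def inverse_poly(z, n):
    """Evaluate P_n(z) = prod_k (z_k - z), equal to sum_{j=0}^{n} (1 - z)**j."""
    value = 1.0 + 0.0j
    for z_k in geometric_roots(n):
        value *= z_k - z
    return value


def residual(z, n):
    """|1 - z * P_n(z)|, which is analytically |1 - z|**(n + 1)."""
    return abs(1 - z * inverse_poly(z, n))


def order_needed(z, tolerance):
    """Smallest order n with residual below tolerance (requires |1 - z| < 1)."""
    n = 1
    while residual(z, n) >= tolerance:
        n += 1
    return n


# Points closer to the origin (mimicking smaller eigenvalues at lighter
# quark mass) need a higher polynomial order for the same accuracy.
for z in (0.2, 0.1):
    print(f"z = {z}: order needed for 1% residual = {order_needed(z, 1e-2)}")
```

In this toy model each root corresponds to one factor of the product, which is why a lower polynomial order translates directly into fewer boson fields.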

The other key element for the optimization of 
an MBA is the choice of "trajectory", T, i.e., the 
basic updating pattern. For that matter, one has 
to devise an appropriate blend of gauge and boson 
field updates. We have chosen the "trajectory" to 
be 

T = COgiObOgfHl,iOgObfOg, (2) 

with $C$ being the final accept/reject correction 
step, $O_g$ ($O_b$) a gauge (boson) field overrelaxation 
sweep and $H_b$ a boson field global quasi-heat 
bath [12]. 

The number of boson fields is tuned so as 
to ensure a good acceptance rate in the range 
of 60-70%. The resulting values are 
quoted in table 2. 

4. Results 

The determination of accurate autocorrelation 
times, $\tau_{\rm int}$, needs very long Monte Carlo time 
histories. The number of trajectories underlying 
the present analysis together with our results 
for the efficiencies, $E_{\rm ind}$, are collected in table 4. 
At $\kappa = 0.160$, we have also used the topological 
charge, $Q$, as a monitor for autocorrelations. 

$\kappa$    $n$    $P_{\rm acc}$    $\Delta\beta$ 
0.159       24     mWo              0.876 
0.160       42     65.9%            1.763 

Table 2 
Number of boson fields $n$ required for the MBA 
with the resulting acceptance rates at the two 
working points and the shift of the gauge coupling 
used for UV-filtering. 



$\kappa$    Observ.        HMC           MBA 
0.159       Plaq.          0.55816(4)    0.55804(7) 
            ($am_\pi$)     0.4406(33)    0.4488(37) 
            ($am_\rho$)    0.5507(59)    0.5635(83) 
0.160       Plaq.          0.56077(6)    0.56067(5) 
            ($am_\pi$)     0.3041(36)    0.3184(68) 
            ($am_\rho$)    0.4542(77)    0.4674(81) 

Table 3 
Average plaquette, non-singlet pseudo-scalar 
($am_\pi$) and vector ($am_\rho$) meson masses. 



As a check for our simulation code we have 
computed the average plaquette and (non-singlet) 
pseudo-scalar and vector meson mass values, as 
shown in table 3. They agree within errors. 

$E_{\rm ind}$(plaq) has been obtained from direct integration 
of the plaquette autocorrelation functions, 
while for the masses the jackknife blocking technique 
has been applied. The latter method only 
allows for a rather crude estimate of autocorrelation 
times, so that we concentrate mainly on 
the plaquette in the following, where we have safe 
control over statistical errors. 

The statistics at the point $\kappa = 0.159$ covers 
about 320 autocorrelation times for the HMC 
and 140 for the MBA. At the working point 
$\kappa = 0.160$, the HMC series has a length of 140 
and the MBA series of 160 times $\tau_{\rm int}$. Hence, the statistics 
is sufficient for the results to be conclusive. 
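
The "direct integration" of an autocorrelation function can be sketched as follows. This is a minimal illustration under stated assumptions, not the analysis code of this study: it assumes the convention $\tau_{\rm int} = 1/2 + \sum_{t=1}^{W} \rho(t)$ with a fixed summation window $W$ chosen by hand, and it is tested on a synthetic AR(1) series (for which $\tau_{\rm int} = (1+a)/(2(1-a))$ is known exactly) rather than on plaquette data. All function names are hypothetical.

```python
import random


def autocorrelation(series, lag):
    """Normalized autocorrelation rho(lag) of a time series."""
    n = len(series)
    mean = sum(series) / n
    centered = [x - mean for x in series]
    variance = sum(c * c for c in centered) / n
    covariance = sum(centered[i] * centered[i + lag] for i in range(n - lag)) / (n - lag)
    return covariance / variance


def integrated_autocorrelation_time(series, window):
    """tau_int = 1/2 + sum_{t=1}^{window} rho(t), summed up to a fixed window."""
    return 0.5 + sum(autocorrelation(series, t) for t in range(1, window + 1))


def ar1_series(length, a, seed=1234):
    """Synthetic test data: x_{t+1} = a * x_t + Gaussian noise, rho(t) = a**t."""
    rng = random.Random(seed)
    x, out = 0.0, []
    for _ in range(length):
        x = a * x + rng.gauss(0.0, 1.0)
        out.append(x)
    return out


series = ar1_series(20000, 0.8)
tau_est = integrated_autocorrelation_time(series, window=30)
tau_exact = (1 + 0.8) / (2 * (1 - 0.8))  # exact value for the AR(1) toy model
print(f"estimated tau_int = {tau_est:.2f}, exact = {tau_exact:.2f}")
print(f"series length in units of tau_int = {len(series) / tau_est:.0f}")
```

The last line mirrors the statement above: the reliability of the estimate is judged by how many autocorrelation times the run covers.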








$\kappa$   Alg.   # trajectories   $\tau_{\rm int}$(plaq)   $E_{\rm ind}$(plaq)   $E_{\rm ind}(m_\pi)$   $E_{\rm ind}(m_\rho)$   $E_{\rm ind}(Q)$ 
0.159      HMC    3521             11.0(0.4)                712(26) x 10^3        < 1620 x 10^3          < 1620 x 10^3 
           MBA    6217             44.7(3.4)                786(60) x 10^3        528 x 10^3             704 x 10^3 
0.160      HMC    5003             34.1(3.1)                572(52) x 10^4        350 x 10^4             894 x 10^4              850(34) x 10^4 
           MBA    9910             61.1(4.1)                216(14) x 10^4        102 x 10^4             70 x 10^4               324(22) x 10^4 

Table 4 
Lengths of MC runs, estimated plaquette autocorrelation times and total cost, $E_{\rm ind}$, for producing one 
configuration decorrelated w.r.t. various observables, from HMC and MBA. 



From table 4 one finds that $E_{\rm ind}$(plaq) is about 
equal for MBA and HMC at the larger quark 
mass ($\kappa = 0.159$). The total effort in the case of the HMC 
increases by a factor of 8 when stepping down 
in quark mass, while at the same time the cost 
for MBA 'only' rises by a factor of 2.7. Consequently, 
at the lighter quark mass the MBA has become almost a factor of 
three more efficient than HMC. 
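
The factors quoted above can be checked with a few lines of arithmetic. The sketch below uses the central values of $E_{\rm ind}$(plaq) from table 4, reading the powers of ten as $10^3$ at $\kappa = 0.159$ and $10^4$ at $\kappa = 0.160$; statistical errors are ignored, and the variable names are hypothetical.

```python
# Central values of E_ind(plaq) from table 4, in matrix-vector multiplies.
# Assumption: powers of ten read as 10^3 (kappa = 0.159) and 10^4 (kappa = 0.160).
e_ind = {
    ("HMC", "0.159"): 712e3,
    ("MBA", "0.159"): 786e3,
    ("HMC", "0.160"): 572e4,
    ("MBA", "0.160"): 216e4,
}

# Cost growth of each algorithm when stepping down in quark mass.
hmc_growth = e_ind[("HMC", "0.160")] / e_ind[("HMC", "0.159")]
mba_growth = e_ind[("MBA", "0.160")] / e_ind[("MBA", "0.159")]

# Relative efficiency of MBA over HMC at the lighter quark mass.
gain = e_ind[("HMC", "0.160")] / e_ind[("MBA", "0.160")]

print(f"HMC cost growth: {hmc_growth:.1f}")
print(f"MBA cost growth: {mba_growth:.1f}")
print(f"MBA gain at kappa = 0.160: {gain:.1f}")
```

Since the two algorithms cost about the same at $\kappa = 0.159$, the gain at $\kappa = 0.160$ is essentially the ratio of the two growth factors.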

5. Conclusions 

We have demonstrated in a realistic setting 
that MBA-type sampling techniques are not at all 
inferior to the state-of-the-art HMC techniques 
and appear to show superior scaling behavior in 
the quark mass. At the lighter of our two quark 
masses, this better scaling leads to a gain factor 
of almost three in favor of the particular MBA 
variant studied here. 

It remains to be seen to what extent this 
encouraging result will persist in the regime 
of smaller quark masses. The ultimate aim is 
to push full QCD simulations with Wilson-like 
fermions into the regime $m_\pi/m_\rho < 0.5$. 

Another point to check is whether the MBA 
variant considered in this paper, with its non-hermitian 
approximation, will remain operational 
for practical simulations in the deeper chiral 
regime. Because the GMRES method suffers 
from numerical instabilities for increasing orders 
$n$, one might have to resort, even on 
$32 \times 16^3$ lattices, to 128-bit precision. A more 
extended MBA study in this direction is in preparation [13]. 

Acknowledgments 

N.E. is supported under DFG grant Li701/3-1, 
M.P. by the European Community's Human 
Potential program under HPRN-CT-2000-00145 
Hadrons/Lattice QCD, and W.S. by the DFG 
Graduiertenkolleg "Feldtheoretische Methoden in 
der Elementarteilchentheorie und Statistischen 
Physik". The numerical productions were run on 
the APE-100 systems installed at NIC Zeuthen. 

REFERENCES 

1. Th. Lippert, these proceedings; A. Ukawa, 
ibid.; H. Wittig, ibid. 

2. C. Alexandrou et al., Phys. Rev. D 61 (2000) 
074503. 

3. S. Hands et al., Eur. Phys. J. C 17 (2000) 
285. 

4. I. Campos et al. [DESY-Münster Coll.], Eur. 
Phys. J. C 11 (1999) 507. 

5. C. Gebert, F. Farchioni, I. Montvay, and 
W. Schroers, these proceedings. 

6. Th. Lippert, Habilitationsschrift, Univ. of 
Wuppertal, 2001. 

7. P. de Forcrand, Parallel Computing 25 (1999) 
1341. 

8. W. Schroers, PhD thesis, Univ. of Wuppertal, 
2001. 

9. A. Borrelli et al., Nucl. Phys. B 477 (1996) 
809. 

10. P. de Forcrand, Nucl. Phys. Proc. Suppl. 73 
(1999) 822. 

11. I. Montvay, Comput. Phys. Commun. 109 
(1998) 144. 

12. P. de Forcrand, Phys. Rev. E 59 (1999) 3698. 

13. W. Schroers, N. Eicker, M. D'Elia, Ph. de 
Forcrand, C. Gebert, Th. Lippert, I. Montvay, 
B. Orth, M. Pepe, and K. Schilling, in 
preparation. 


